ClassMate: A System for Automated Event Extraction from Course Websites
Abstract
Websites contain a huge amount of time-critical data in highly unstructured and heterogeneous form. Information Extraction systems can extract relevant entities and relationships from these sites, then identify, classify, and categorize them. In this paper, we present ClassMate, a complete system for extracting key course-related events from university course websites. ClassMate pipelines web data through a Named Entity Recognition module, a window-based event extractor, and a K-Means clustering-based classifier.
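The window-based extraction stage can be illustrated with a minimal sketch. It is not ClassMate's implementation: the abstract does not describe the extractor's details, so the month-plus-day date matcher (a stand-in for the Named Entity Recognition module), the whitespace tokenizer, the function names `is_date` and `extract_events`, and the default window size are all assumptions made for illustration. The clustering stage is omitted.

```python
import re

# Hypothetical stand-in for the NER module: a month name or abbreviation.
MONTH_RE = re.compile(
    r"jan(uary)?|feb(ruary)?|mar(ch)?|apr(il)?|may|june?|july?|"
    r"aug(ust)?|sep(t|tember)?|oct(ober)?|nov(ember)?|dec(ember)?"
)


def is_date(tokens, i):
    """Return True if tokens[i] is a month followed by a day number (1-31)."""
    if i + 1 >= len(tokens):
        return False
    month = tokens[i].lower().rstrip(".")
    day = tokens[i + 1].rstrip(".,;:")
    return bool(MONTH_RE.fullmatch(month)) and day.isdigit() and 1 <= int(day) <= 31


def extract_events(text, window=3):
    """Pair each detected date with up to `window` tokens on either side."""
    tokens = text.split()  # assumed tokenizer: plain whitespace splitting
    events = []
    for i in range(len(tokens) - 1):
        if is_date(tokens, i):
            start = max(0, i - window)
            end = min(len(tokens), i + 2 + window)
            events.append({
                "date": f"{tokens[i]} {tokens[i + 1].rstrip('.,;:')}",
                # Context excludes the two date tokens themselves.
                "context": tokens[start:i] + tokens[i + 2:end],
            })
    return events


if __name__ == "__main__":
    page = "Homework 2 is due Oct 14 in class. The midterm exam is on Nov 3 at noon."
    for event in extract_events(page):
        print(event["date"], "|", " ".join(event["context"]))
```

In a fuller pipeline, the context windows produced here would be turned into feature vectors and passed to the clustering-based classifier to assign an event type.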
Publication date: 2008